Respondent-driven sampling bias induced by clustering and community structure in social networks

نویسندگان

  • Luis Enrique Correa da Rocha
  • Anna Ekeus Thorson
  • Renaud Lambiotte
  • Fredrik Liljeros
چکیده

Sampling hidden populations is particularly challenging using standard sampling methods mainly because of the lack of a sampling frame. Respondent-driven sampling (RDS) is an alternative methodology that exploits the social contacts between peers to reach and weight individuals in these hard-to-reach populations. It is a snowball sampling procedure where the weight of the respondents is adjusted for the likelihood of being sampled due to differences in the number of contacts. In RDS, the structure of the social contacts thus defines the sampling process and affects its coverage, for instance by constraining the sampling within a sub-region of the network. In this paper we study the bias induced by network structures such as social triangles, community structure, and heterogeneities in the number of contacts, in the recruitment trees and in the RDS estimator. We simulate different scenarios of network structures and response-rates to study the potential biases one may expect in real settings. We find that the prevalence of the estimated variable is associated with the size of the network community to which the individual belongs. Furthermore, we observe that low-degree nodes may be under-sampled in certain situations if the sample and the network are of similar size. Finally, we also show that low response-rates lead to reasonably accurate average estimates of the prevalence but generate relatively large biases.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

نمونه‌گیری پاسخگو محور در مقایسه با سایر روش‌های نمونه‌گیری از جوامع پنهان

Sampling hidden populations is challenging due to the lack of convenience statistical frames. Since most populations exposed to special diseases are hidden and hard to reach, sampling methods that produce representative and efficient samples from the populations have become a study subject for researches all over the world. Because of the unknown probability of selecting samples in conventional...

متن کامل

Respondent-Driven Sampling: An Assessment of Current Methodology.

Respondent-Driven Sampling (RDS) employs a variant of a link-tracing network sampling strategy to collect data from hard-to-reach populations. By tracing the links in the underlying social network, the process exploits the social structure to expand the sample and reduce its dependence on the initial (convenience) sample.The current estimators of population averages make strong assumptions in o...

متن کامل

Detecting Overlapping Communities in Social Networks using Deep Learning

In network analysis, a community is typically considered of as a group of nodes with a great density of edges among themselves and a low density of edges relative to other network parts. Detecting a community structure is important in any network analysis task, especially for revealing patterns between specified nodes. There is a variety of approaches presented in the literature for overlapping...

متن کامل

New Survey Questions and Estimators for Network Clustering with Respondent-Driven Sampling Data

Respondent-driven sampling (RDS) is a popular method for sampling hard-to-survey populations that leverages social network connections through peer recruitment. While RDS is most frequently applied to estimate the prevalence of infections and risk behaviors of interest to public health, like HIV/AIDS or condom use, it is rarely used to draw inferences about the structural properties of social n...

متن کامل

تشخیص اجتماعات ترکیبی در شبکه‌های اجتماعی

One of the great challenges in Social Network Analysis (SNA) is community detection. Community is a group of vertices which have high intra connections and sparse inter connections. Community detection or Clustering reveals community structure of social networks and hidden relationships among their constituents. By considering the increase of datasets related to social networks, we need scalabl...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1503.05826  شماره 

صفحات  -

تاریخ انتشار 2015